Sequential and Shared-Memory Parallel Algorithms for Partitioned Local Depths

Devarakonda, Aditya, Ballard, Grey

arXiv.org Artificial Intelligence

In this work, we design, analyze, and optimize sequential and shared-memory parallel algorithms for partitioned local depths (PaLD). Given a set of data points and pairwise distances, PaLD is a method for identifying the strength of pairwise relationships based on relative distances, enabling the identification of strong ties within dense and sparse communities even if their sizes and within-community absolute distances vary greatly. We design two algorithmic variants that perform community structure analysis through triplet comparisons of pairwise distances. We present theoretical analyses of computation and communication costs and prove that the sequential algorithms are communication-optimal, up to constant factors. We introduce performance optimization strategies that yield sequential speedups of up to $29\times$ over a baseline sequential implementation and parallel speedups of up to $19.4\times$ over optimized sequential implementations using up to $32$ threads on an Intel multicore CPU.
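
To make the triplet comparisons concrete, here is a minimal NumPy sketch of the cohesion computation that PaLD is built on, following the published definition (Berenhaut et al., 2022). The function name, tie-handling convention, and normalization are illustrative assumptions, not the authors' optimized implementation:

```python
import numpy as np

def pald_cohesion(D):
    """Cohesion matrix via triplet comparisons of pairwise distances.

    D is an (n, n) symmetric distance matrix; the result C[x, w] is
    the cohesion of point w to point x. A simplified sketch of the
    definition in Berenhaut et al. (2022); distance ties are split
    evenly between the two contesting points.
    """
    n = D.shape[0]
    C = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            if y == x:
                continue
            # Conflict focus U_{x,y}: points at least as close to x
            # or to y as x and y are to each other.
            focus = np.flatnonzero((D[:, x] <= D[x, y]) | (D[:, y] <= D[x, y]))
            size = focus.size
            for w in focus:
                if D[w, x] < D[w, y]:
                    C[x, w] += 1.0 / size   # w sides with x
                elif D[w, x] == D[w, y]:
                    C[x, w] += 0.5 / size   # tie: split support
    return C / (n - 1)
```

Strong ties can then be read off by thresholding the symmetrized cohesion min(C[x, w], C[w, x]), as in the original PaLD formulation; the sequential and parallel algorithms in the paper optimize computations of exactly this $O(n^3)$ triplet-comparison form.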


A Theory of I/O-Efficient Sparse Neural Network Inference

Gleinig, Niels, Ben-Nun, Tal, Hoefler, Torsten

arXiv.org Artificial Intelligence

As the accuracy of machine learning models increases at a fast rate, so does their demand for energy and compute resources. At a low level, most of these resources are consumed by data movement between different memory units. Modern hardware architectures contain a fast memory (e.g., caches, registers), which is small, and a slow memory (e.g., DRAM), which is larger but expensive to access. We can only process data that is stored in fast memory, which incurs data movement (input/output operations, or I/Os) between the two units. In this paper, we provide a rigorous theoretical analysis of the I/Os needed in sparse feedforward neural network (FFNN) inference. We establish bounds that determine the optimal number of I/Os up to a factor of 2 and present a method that uses a number of I/Os within that range. Much of the I/O complexity is determined by a few high-level properties of the FFNN (number of inputs, outputs, neurons, and connections), but if we want to get closer to the exact lower bound, the instance-specific sparsity patterns need to be considered. Departing from the 2-optimal computation strategy, we show how to reduce the number of I/Os further with simulated annealing. Complementing this result, we provide an algorithm that constructively generates networks with maximum I/O-efficiency for inference. We test the algorithms and empirically verify our theoretical and algorithmic contributions. In our experiments on real hardware, we observe speedups of up to $45\times$ relative to the standard way of performing inference.
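
The two-level cost model can be made concrete with a toy counter that tallies loads into a bounded fast memory for a given evaluation order of a network's connections. Everything below is an illustrative assumption (the function name, the LRU policy, and per-value granularity), not the paper's model, which also accounts for writes and proves order-independent lower bounds; the point it demonstrates is simply that the I/O count depends on the evaluation order, which is what a search such as simulated annealing can vary:

```python
import random
from collections import OrderedDict

def count_ios(edges, fast_mem_size):
    """Count loads from slow memory for one evaluation order of a
    sparse layer, with an LRU-managed fast memory of a given capacity.

    Evaluating edge (src, dst) requires both neurons' values to be
    resident in fast memory; each miss costs one I/O.
    """
    cache = OrderedDict()  # neuron id -> None, in recency order
    ios = 0
    for edge in edges:
        for node in edge:
            if node in cache:
                cache.move_to_end(node)        # hit: refresh recency
            else:
                ios += 1                       # miss: load from slow memory
                if len(cache) >= fast_mem_size:
                    cache.popitem(last=False)  # evict least recently used
                cache[node] = None
    return ios

# A small bipartite layer: neurons 0-7 feed neurons 8-15.
layer = [(s, d) for s in range(8) for d in range(8, 16)]
random.seed(0)
shuffled = random.sample(layer, len(layer))
# Grouping edges by source reuses the cached source value, so it
# typically incurs fewer I/Os than an arbitrary order.
print(count_ios(layer, 6), count_ios(shuffled, 6))
```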